Critics of the use of test scores in college admissions cite the relatively low 
correlations between admission test scores and success in college, for example 
first-year grade point average (GPA). They point out that a typical correlation of .40 between 
the admission test score and end-of-first-year GPA indicates that the test score predicts 
only 16 percent of the variance in the first-year grades. Critics assert that the percent of 
variance is so low that colleges and universities place too much emphasis on test scores 
as predictors of college success (Vasquez and Jones, 2006; Kohn, 2001; Sternberg, 
Wagner, Williams, and Horvath, 1995; among others). 

On the other hand, defenders of using test scores in admission decisions note 
that some of the criticism may be unwarranted. For example, Sackett, Borneman, and 
Connelly (2008) outlined several reasons why test scores might be more effective 
predictors of college success than what is typically reported. First, "variance explained" 
may not be the best metric for interpreting the predictive ability of test scores. That is, if 
you convert "variance explained" to "differences in odds," test scores predict the 
likelihood of a subject being successful quite well. Second, most studies use a single 
dependent variable, first-year GPA, which is unreliable because of its differences across 
various types of institutions and because of differences in student course-taking 
patterns. Third, admission tests do a reasonably good job predicting grades beyond the 
first year of college in addition to predicting the performance on other dependent 
variables such as GRE scores, LSAT scores, doctoral degree achievement, and getting 
tenure (Kuncel & Hazlett, 2007; Lubinski, Benbow, Webb, and Bleske-Rechek, 2006; Vey 
et al., 2003). Fourth, restriction of range, that is, examining the relationship only 
among students who both take admission tests and actually go to college, was cited as 
a major factor limiting the predictability of test scores (Sackett, Borneman, and 
Connelly, 2008). This final reason, restriction of range, is the focus of the current 
study. 

What may be overlooked by the critics is that the correlations in question 
typically are calculated using a population of full-time enrolled, entering freshmen who 
completed a full year of college. In judging the value of the test in predicting college 
success, a more relevant population is that of all potential college students, specifically, 
that of all high school graduates. Clearly this is a more heterogeneous population than 
the one used in calculating the correlations. The implication of this situation is that the 
calculated correlations understate the more meaningful correlations for the 
unrestricted population. 

The population of high school graduates is initially restricted by the elimination 
of those who do not apply to college and next by those who apply but are not accepted 
and then by those who are accepted but do not enroll. Then there is the group that 
enrolls on a part-time basis and typically is not used in calculating the correlations of 
interest. Similarly, there is the further restriction that excludes those who enroll on a 
full-time basis but do not complete the first year with a grade point average. 

Indicators of success in high school are also used as predictors of success in 
college. As a matter of fact, the usual finding is that success in high school is a better 
predictor of success in college than is admission test score (Hoffman, 2002; Munro, 
1981; Zheng et al., 2002; among others). Further, the combination of test scores with 
measures of success in high school provides a better prediction of success in college 
than either predictor used individually (Mathiasen, 1984; Wolfe & Johnson, 1995; 
Eimers & Pike, 1997; Noble & Sawyer, 1997; Flemming, 2002; Kim, 2002; Zwick & Sklar, 
2005; Geiser & Santelices, 2007; Linn, 1990). 

Purpose 

The purpose of this study is to illustrate techniques for correcting a correlation 
between a predictor of success in college (admission test score or indicator of high 
school performance) with a measure of success in college (one-year retention or first- 
year GPA) given the restricted variances in the population used to calculate the 
correlations. In other words, this study demonstrates procedures for estimating 
correlations in the unrestricted population (students who attend college and students 
who do not attend college) based upon correlations calculated for the restricted 
population (students who attend college). A secondary purpose is to set the foundation 
for and stimulate additional studies designed to estimate these correlations in other 
unrestricted higher education and college student populations. This study focuses on 
correlations involving admission test scores, indicators of success in high school, and 
first-year college GPA. 



Formulas for Correcting Correlations 

Classical measurement theory provides the means to "correct" correlations for 
restrictions in the variances of the correlated variables. The restriction in the variance 
of a predictor variable is due to selection on that variable or on a related variable or to 
selection on more than one variable. In the case of college admissions, students may be 
selected on the basis of an admission test, an indicator of high school success, or some 
combination of the two. In correcting correlations for restriction of variance a 
distinction between explicit selection and implicit or incidental selection is made. In 
order to correct a correlation, the variance of a relevant variable in the unrestricted 
population must be known. Selection on the basis of a variable for which the variance in 
the unrestricted population is known is referred to as explicit selection. Selection on the 
basis of some other variable that is related to the selection variable and for which the 
unrestricted population variance is known is considered to be implicit or incidental 
selection. 

In this study, three cases and associated formulas for estimating correlations in 
unrestricted populations are described. These three formulas are the ones derived or 
discussed in the seminal literature and are the formulas that have the most application 
in the current research. In the end, two of the formulas are actually employed in 
correcting correlations in this study. Assumptions underlying the three formulas include 
the conventional ones of linearity and homoscedasticity. Also assumed is that the 
selection of the restricted population from the unrestricted population is as described 
for the formulas being used. 

The three situations and formulas for each are noted below. In these formulas, $s$ 
(standard deviation), $s^2$ (variance), and $r$ (correlation) are statistics from the restricted 
population and $S$, $S^2$, and $R$ are the corresponding statistics from the unrestricted 
population; $X_1$ is the predictor variable, $X_2$ is the variable being predicted, and $X_3$ is a 
third variable related to $X_1$. 

Case 1. In this case subjects are selected on the basis of $X_1$ and the values of 
$s_1$, $s_2$, $S_2$, and $r_{12}$ are known. Gulliksen (1950) describes this as the case of incidental 
selection and derives the following formula: 

$$R_{12} = \sqrt{1 - (1 - r_{12}^2)(s_2^2 / S_2^2)}$$

Gulliksen cites Kelly (1923), Garrett (1947), Guilford (1942), Crawford and Burnham 
(1946), and Thorndike (1947) as including this formula or equivalent versions of it. Also 
see Cureton (1951) and Guilford (1965). 

In predicting college success, the variance of $X_2$, the success variable (e.g., 
first-year GPA), is not known in the unrestricted population of all high school graduates 
because not all high school graduates completed the first year of college. 
Consequently, the Case 1 formula cannot be applied in this study. 
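
Although the Case 1 formula is not used in this study, a minimal sketch of it may help make the notation concrete. The function name and the input values below are hypothetical and chosen only for illustration; reattaching the sign of the restricted correlation is an assumption of the sketch, since the formula itself yields a magnitude.

```python
import math


def correct_case1(r12: float, s2: float, S2: float) -> float:
    """Estimate the unrestricted correlation R12 with the Case 1 formula.

    r12 -- correlation in the restricted population
    s2  -- standard deviation of X2 in the restricted population
    S2  -- standard deviation of X2 in the unrestricted population
    """
    if s2 <= 0 or S2 <= 0:
        raise ValueError("standard deviations must be positive")
    inner = 1.0 - (1.0 - r12 ** 2) * (s2 ** 2 / S2 ** 2)
    if inner < 0:
        raise ValueError("inputs imply a negative value under the square root")
    # Assumption of this sketch: keep the sign of the restricted correlation.
    return math.copysign(math.sqrt(inner), r12)


# Hypothetical values, not taken from the study.
example = correct_case1(r12=0.40, s2=0.6, S2=0.9)
print(f"Estimated unrestricted correlation: {example:.2f}")
```

When the restricted and unrestricted standard deviations are equal, the function returns the restricted correlation unchanged, which is a quick way to check an implementation.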



Case 2. In this case subjects are selected on the basis of $X_1$ and the values of $s_1$, 
$s_2$, $S_1$, and $r_{12}$ are known. Gulliksen (1950) refers to this as the case of explicit selection on 
$X_1$ and derives the following formula: 

$$R_{12} = \frac{S_1 r_{12}}{\sqrt{s_1^2 - s_1^2 r_{12}^2 + S_1^2 r_{12}^2}}$$



Slightly different, but equivalent formulas are given by Guilford (1965) and Cureton 
(1951). Gulliksen (1950) indicates that this formula was first derived by Pearson (1903) 
and that slightly different versions of it have been given by Kelly (1923), Holzinger 
(1928), Thurstone (1931), Thorndike (1947), Crawford and Burnham (1946), and others. 

For example, assume that students are selected for admission on the 
basis of a predictor of college success, $X_1$. If the variance of the predictor variable in 
the unrestricted population of high school graduates, $S_1^2$, the variances of the 
predictor, $s_1^2$, and of the college success variable, $s_2^2$, and the correlation between 
the two variables, $r_{12}$, in the restricted range population are known, then the correlation 
between the predictor variable and the college success variable, $R_{12}$, for the population 
of high school graduates can be estimated using this formula. 

A situation in which the Case 2 formula might be used is the one in which a test 
is given to a population of subjects and only those who score in the top, say, 40% on the 
test are selected for a program of instruction. At the end of the program the selected 
subjects are given an achievement test on the subject matter of the instruction. The 
correlation, $r_{12}$, of the selection test, $X_1$, and the achievement test, $X_2$, and the variances, 
$s_1^2$ and $s_2^2$, of the two variables are known for the selected subjects, those in the 
restricted population. Also known is the variance, $S_1^2$, of all subjects who took the 
selection test, those in the unrestricted population. The Case 2 formula then would be 
used to estimate the correlation, $R_{12}$, between the two tests for the population of 
subjects who took the selection test. This situation is illustrated by Berry and Sackett 
(2008) in a study involving students at 41 colleges. They found a correlation of .35 
between SAT (Verbal + Math) score and first-year GPA. The corrected correlation for 
the population of students who took the SAT was .47. These situations are similar to the 
one for which the Case 2 formula is used in this study. 
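
The top-40% selection scenario described above can be sketched as a small simulation. Everything in it is an assumption made for illustration: the scores are drawn from a bivariate normal distribution with an arbitrary population correlation of .60, and the function names, sample size, and seed are hypothetical. It is a sketch of the Case 2 correction under those assumptions, not a reproduction of any analysis in this study.

```python
import math
import random
import statistics


def correct_case2(r12: float, s1: float, S1: float) -> float:
    """Case 2 (explicit selection on X1): estimate the unrestricted R12."""
    denominator = math.sqrt(s1 ** 2 - s1 ** 2 * r12 ** 2 + S1 ** 2 * r12 ** 2)
    return S1 * r12 / denominator


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(true_r=0.60, n=20000, top_share=0.40, seed=1):
    """Return (full r, restricted r, corrected r) for simulated test scores."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]          # selection test
    noise_sd = math.sqrt(1.0 - true_r ** 2)
    x2 = [true_r * a + noise_sd * rng.gauss(0.0, 1.0) for a in x1]  # outcome

    # Keep only subjects at or above the cutoff for the top share on X1.
    cutoff = sorted(x1)[int(n * (1.0 - top_share))]
    kept = [(a, b) for a, b in zip(x1, x2) if a >= cutoff]
    k1 = [a for a, _ in kept]
    k2 = [b for _, b in kept]

    full_r = pearson(x1, x2)
    restricted_r = pearson(k1, k2)
    corrected_r = correct_case2(
        restricted_r, statistics.pstdev(k1), statistics.pstdev(x1)
    )
    return full_r, restricted_r, corrected_r


full_r, restricted_r, corrected_r = simulate()
print(f"full: {full_r:.2f}  restricted: {restricted_r:.2f}  corrected: {corrected_r:.2f}")
```

Because the simulated data satisfy linearity and homoscedasticity and selection is purely on $X_1$, the corrected value should land close to the full-population correlation, while the restricted value is noticeably lower.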

Case 3. In this case subjects are explicitly selected on the basis of $X_3$, and $X_1$ is a 
third variable, related to $X_3$, for which values are available for subjects in the restricted 
population. It is desired to estimate the correlation between $X_1$, the incidental selection 
variable, and the success variable, $X_2$, for the unrestricted population. In this case the 
values of $s_1^2$, $s_2^2$, $s_3^2$, $S_3^2$, $r_{12}$, $r_{13}$, and $r_{23}$ are known. The formula for this case, given by 
Gulliksen (1950), follows: 

$$R_{12} = \frac{r_{12} - r_{13} r_{23} + r_{13} r_{23} (S_3^2 / s_3^2)}{\sqrt{[1 - r_{13}^2 + r_{13}^2 (S_3^2 / s_3^2)][1 - r_{23}^2 + r_{23}^2 (S_3^2 / s_3^2)]}}$$

Gulliksen (1950) indicates that variants of this formula appear in Pearson (1903) and 
Thorndike (1947). The formula or its equivalent also appears in Cureton (1951) and 
Guilford (1965). 

For example, if it is assumed students are selected on the basis of one predictor 
of success in college, $X_3$, that the variance of that variable in the unrestricted population 
is known, and that values of a second predictor variable, $X_1$, are available for the 
selected students, then the correlation of the second predictor variable and the success 
variable, $X_2$, in the unrestricted population can be calculated from this formula. 
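
A minimal sketch of the Case 3 calculation follows. The function name and the example inputs are hypothetical and serve only to show how the pieces of the formula fit together; the variances are passed as variances, not standard deviations.

```python
import math


def correct_case3(r12: float, r13: float, r23: float,
                  s3_sq: float, S3_sq: float) -> float:
    """Case 3: estimate R12 when selection is explicit on X3.

    r12, r13, r23 -- correlations in the restricted population
    s3_sq         -- variance of X3 in the restricted population
    S3_sq         -- variance of X3 in the unrestricted population
    """
    if s3_sq <= 0 or S3_sq <= 0:
        raise ValueError("variances must be positive")
    ratio = S3_sq / s3_sq
    numerator = r12 - r13 * r23 + r13 * r23 * ratio
    denominator = math.sqrt(
        (1 - r13 ** 2 + r13 ** 2 * ratio) * (1 - r23 ** 2 + r23 ** 2 * ratio)
    )
    return numerator / denominator


# Hypothetical inputs, not taken from the study.
example = correct_case3(r12=0.30, r13=0.40, r23=0.50, s3_sq=4.0, S3_sq=9.0)
print(f"Estimated unrestricted correlation: {example:.2f}")
```

Two properties are useful as checks: with no restriction on $X_3$ the function returns $r_{12}$ unchanged, and when $X_1$ and $X_3$ are the same variable ($r_{13} = 1$, $r_{12} = r_{23}$) the result coincides with the Case 2 formula.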

Gulliksen (1950) provides two additional three-variable formulas for estimating 
correlations in unrestricted populations if the variance of the incidental selection 
variables in the unrestricted populations is known. These formulas are not given or 
discussed here. Sackett and Yang (2006) provide descriptions of eleven cases in which 
correlations in unrestricted populations might be estimated from data for 
restricted populations. The three cases discussed above are included among the eleven. 
This paper (Sackett and Yang, 2006) includes a comprehensive review of the literature 
on contributions to the matter of estimating population correlations from samples that 
have been restricted due to any of several types of selection. It is recommended to 
anyone seeking to investigate further the topic of the present study. The presentation 
by Thorndike (1949) is frequently cited in the literature on the topic and also is 
recommended to those interested in pursuing it further. 

The Data and Their Application 

The data for this study come from a population of first-time freshmen who 
entered a major research university with moderately selective admission standards in 
the fall 2008 semester, whose high school class percentile rank was 50 or greater, who 
entered the fall semester as full-time, degree-seeking students, and who completed 
both semesters with complete data for the study variables. There are 3,668 students in 
this population. The variables collected for these students are: 

ACT-C - ACT Composite Score 

HSCPR - High School Class Percentile Rank 




AIR 2010 Forum - Chicago, IL 




NHSCPR - Normalized High School Class Percentile Rank 
CCGPA - High School Core Course Grade Point Average 
FYGPA - Freshman Year Grade Point Average 



The HSCPRs are transformed into normalized values (NHSCPRs) by converting 
cumulative percentile values into z-values using a normal curve table and then 
transforming the z-values to values with a mean of 50 and a standard deviation of 10. 
The NHSCPR variable is used in 
the data analysis because it has more desirable statistical properties than the HSCPR 
one. The CCGPA is an average on a 4-point scale calculated from the high school 
academic core courses in English, mathematics, science, social science and fine arts. 

The other variables are as traditionally defined. 
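
The normalization of HSCPR described above can be sketched as follows. The function name is hypothetical, and the inverse normal distribution from the Python standard library stands in for the normal curve table. The treatment of percentile ranks at exactly 0 or 100, which have no finite z-value, is an assumption of this sketch and not a rule stated in the study.

```python
from statistics import NormalDist


def normalize_percentile_rank(hscpr: float, mean: float = 50.0,
                              sd: float = 10.0) -> float:
    """Convert a percentile rank (0-100) to a normalized score."""
    # Assumption: clamp extreme ranks so the inverse normal stays finite.
    clamped = min(max(hscpr, 0.5), 99.5)
    z = NormalDist().inv_cdf(clamped / 100.0)
    return mean + sd * z


for rank in (50, 75, 90, 99):
    print(rank, round(normalize_percentile_rank(rank), 1))
```

A percentile rank of 50 maps to a normalized score of 50, and ranks above 50 map to scores above 50, which is why the study population (percentile rank of 50 or greater) has normalized scores at or above 50.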

Some statistics for this population are shown in Table 1. 



Table 1. Means, standard deviations, and variances for study 
variables for students in the study population, N = 3,668 



Statistic    ACT-C    NHSCPR    CCGPA    FYGPA
Mean         25.65    60.13     3.50     3.05
s             3.71     6.32     0.38     0.67
s^2          13.76    39.94     0.14     0.45



For the study, the correlations between each of the three predictor variables 
(ACT-C, NHSCPR, and CCGPA) and the college success variable, FYGPA, are calculated. 
These statistics are descriptive of the restricted population for which the data are 
collected. The applicable formulas described in the preceding section are then used to 
estimate the values of the correlations for all high school graduates, those in the 
unrestricted population. 

The admission requirements for the subject university include minimum 
numbers of completed units of core subject areas of high school courses and standards 
based upon ACT-C and HSCPR. Specifically, students with an ACT-C of 24 or higher are 
admissible and those with scores between 18 and 23 are deemed admissible on the 
basis of a sliding-scale combination of their ACT-C and HSCPR. 

In calculating the desired estimates of correlations for all high school graduates it 
is assumed that enrolled students are selected on the basis of HSCPR or NHSCPR. In 
order that the data to be analyzed conform as much as possible to this assumption, the 
study population is restricted to students whose HSCPR or NHSCPR is 50 or greater. In 
other words, this restricted population is treated as having been selected on the basis of 
HSCPR or NHSCPR. Even with this restriction, the assumption is not completely correct. 
First, the students are also selected on the basis of core course requirements and ACT-C. 
In addition, the population of admitted students is restricted further to those who enroll 
full-time and complete the freshman year with complete data on the study variables. 
Consequently, the estimated correlations will be in error to some undetermined degree. 
However, the derived values are certainly more accurate estimates of the correlations 
for all high school graduates than are the values calculated for the enrolled student 
population. Gulliksen (1950) suggests that "In many cases, however, it is clear that a 
given selection test was one of the major items in the selection procedure so that the 
results found by assuming that selection was solely on the basis of the test will not be 
far from the correct estimate." 

The formulas for correcting correlations for restricted variances require that a 
standard deviation or variance of a relevant variable in the unrestricted population be 
known. For this study the unrestricted population is that of all high school graduates. 
The variance of NHSCPR, the normalized values of HSCPR, for this population is $10^2$, or 
100. This is the value used to correct the prediction-of-success correlations. 

If the population of "college bound" high school graduates were of interest, the 
correlations for this population could be estimated using ACT-C scores and the variance 
of these scores in the population of ACT test-takers. For this study, however, the 
population of all high school graduates is of more interest than the population of high 
school graduates who have taken the ACT. Thus, this application of the formula is not 
explored. 

The degree of restriction for the study population is evident in the following. In 
recent years the rate of college-going for the state in which the subject university is 
located and for the nation as a whole has been around 60%. Thus, many students from the 
unrestricted population are clearly excluded from the study population. Next, 
approximately 14,500 prospective first-time freshmen applied to the subject university 
for the fall 2008, around 12,300 were admitted, and 5,800 enrolled. The study 
population of 3,668 includes those who enrolled full-time, completed both semesters, 
and had complete data on the study variables. This is a noteworthy reduction in size of 
the population and clearly suggests a reduction in the variances of study variables. 

Results 



Correlations among the several study variables for the study population 
are contained in Table 2. These are the restricted population correlations. 



Table 2. Correlations Among Study Variables 



Variable    NHSCPR    CCGPA    FYGPA
ACT-C       0.43      0.36     0.43
NHSCPR                0.78     0.49
CCGPA                          0.56



Correlation of ACT-C with FYGPA 

To correct the restricted population correlation, .43, the Case 3 formula is used. 
It is assumed that students are selected on the basis of NHSCPR, $X_3$. The predictor 
variable is ACT-C, $X_1$, and the variable being predicted is FYGPA, $X_2$. The following 
values, from Table 1 and Table 2 (with four significant digits), are used in the formula: 
$r_{12} = .4257$, $r_{13} = .4281$, $r_{23} = .4891$, 

$s_1^2 = 13.77$, $s_2^2 = .4484$, $s_3^2 = 39.94$, and $S_3^2 = 100$. 

The result is: $R_{12} = .5630$. 
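
The arithmetic behind this estimate can be traced by substituting the listed values into the Case 3 formula. The intermediate figures below are rounded to four decimals by this sketch, so the final value is shown to two decimals only; at that precision it agrees with the reported result, and differences in the third decimal can arise from rounding of the inputs.

```latex
\frac{S_3^2}{s_3^2} = \frac{100}{39.94} \approx 2.5038,
\qquad
r_{13} r_{23} = (.4281)(.4891) \approx .2094

R_{12} \approx
\frac{.4257 - .2094 + (.2094)(2.5038)}
     {\sqrt{[1 - .1833 + (.1833)(2.5038)]\,[1 - .2392 + (.2392)(2.5038)]}}
\approx \frac{.7406}{\sqrt{(1.2756)(1.3597)}}
\approx \frac{.7406}{1.3170}
\approx .56
```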

Correlation of NHSCPR with FYGPA 

To correct the restricted population correlation, .49, the Case 2 formula is used. 
For this correlation it is assumed that students are selected on the basis of the predictor 
variable, NHSCPR, $X_1$. The variable being predicted is FYGPA, $X_2$. The following values, 
from Table 1 and Table 2, are used: 

$r_{12} = .4891$, $s_1^2 = 39.94$, $s_2^2 = .4484$, and $S_1^2 = 100$. 

The result is: $R_{12} = .7575$. 

Correlation of CCGPA with FYGPA 

To correct this restricted population correlation, .56, the Case 3 formula is used. 
It is assumed students are selected on the basis of NHSCPR. The predictor variable is 
CCGPA, $X_1$, and the variable being predicted is FYGPA, $X_2$. NHSCPR, $X_3$, is a third variable 
related to CCGPA. The following values, from Table 1 and Table 2, are used: 
$r_{12} = .5582$, $r_{13} = .7829$, $r_{23} = .4891$, 

$s_1^2 = .1445$, $s_2^2 = .4484$, $s_3^2 = 39.94$, and $S_3^2 = 100$. 

The result is: $R_{12} = .8025$. 



Discussion 

Table 3 contains the correlation of each predictor variable with FYGPA for the 
study population, the selected population of enrolled students, and the corresponding 
estimate of the correlation for the unrestricted population of high school graduates. 
The table also includes the percentage of the variance of FYGPA that is accounted for by the 
predictor variable for each correlation. It is important to emphasize that the 
unrestricted population correlations are only estimates of the values that would be 
found were it possible to calculate them directly. To an unknown degree the accuracy 
of the estimates is affected by the fact that the students in the study population were 
not selected purely on the basis of their HSCPR as assumed for the formulas used. 

Table 3. Summary of restricted and unrestricted population 
correlations for predicting FYGPA 

             Restricted Population    Unrestricted Population
Predictor    r       % of Var         R       % of Var
ACT-C        0.43    18               0.56    32
NHSCPR       0.49    24               0.76    57
CCGPA        0.56    31               0.80    64



There are four major findings that can be gleaned from this study. First, by 
applying the formulas and correcting the correlations, the relationship between the 
predictor variables and the measure of success in college increased substantially. From a 
formulaic perspective, if the variance for NHSCPR is 39.94 for the study population (see 
Table 1) and the variance for NHSCPR for all high school graduates is 100.0, similar 
restrictions should be realized in the variances of other study variables that are related 
to NHSCPR. These restrictions in variances indicate that the correlations in the 
population of high school graduates should be higher than those for the population of 
enrolled students. Accordingly, the correlations increased from 
.43 to .56 for the ACT-C, from .49 to .76 for NHSCPR, and from .56 to .80 for CCGPA. By 
and large these increases in correlations are quite substantial, especially for NHSCPR and 
CCGPA. This finding also reinforces the argument that the predictive ability of these 
college admission indicators is relatively robust, particularly when the broader 
population of all high school graduates is considered. 

Second, the correlation of ACT-C with FYGPA in the study population is relatively 
modest at .43. For the unrestricted population, the estimated correlation of ACT-C with 
FYGPA is .56. This may not overwhelmingly justify the use of ACT-C as the sole criterion 
for college admission. At the same time, however, it is important to acknowledge that 
the percentage of variance in FYGPA explained by the ACT-C did increase by 78 percent 
(from 18% to 32%). This increase is not trivial. 

Third, and as expected, the indicators of success in high school have higher 
correlations with the college success variable than does the admission test score. The 
differences between the correlation coefficients for these two types of predictor 
variables, .43 for ACT-C in contrast to .49 for NHSCPR and .56 for CCGPA, are fairly 
substantial. As noted earlier, this finding has been reported in several other studies using 
a wide variety of subjects and institutions (Hoffman, 2002; Munro, 1981; Zheng et al., 
2002; among others). 

Fourth, the differences between the corrected and uncorrected correlations are 
greater for the high school success variables than for the admission test variable. In 
comparison to the restricted population, the unrestricted percentage of variance 
explained increases 138 percent for NHSCPR, 106 percent for CCGPA, and 78 percent for 
the ACT-C. It is not clear why the restrictions in range have a larger impact on the 
correlations of the high school success variables with FYGPA than the corresponding 
correlation involving ACT-C. Perhaps there is something about predicting college 
performance from high school performance that underlies the difference. The estimated 
correlations for the high school success predictors are substantial and reinforce 
the importance of these variables in college admission decisions. Furthermore, these 
findings provide a counter argument for those colleges and universities that have 
minimized the importance of or even dismissed these predictor tools in admission 
decisions. 
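
The percentages of variance in Table 3 and the relative increases quoted in this discussion follow directly from the reported correlations. The short sketch below retraces that arithmetic using the four-digit values given in the Results section; the function and variable names are hypothetical.

```python
# (restricted r, estimated unrestricted R) as reported in the Results section.
reported = {
    "ACT-C": (0.4257, 0.5630),
    "NHSCPR": (0.4891, 0.7575),
    "CCGPA": (0.5582, 0.8025),
}


def percent_of_variance(r: float) -> int:
    """Squared correlation expressed as a whole-number percentage."""
    return round(100 * r ** 2)


def summarize(correlations):
    """Map each predictor to (restricted %, unrestricted %, relative increase %)."""
    rows = {}
    for name, (r, big_r) in correlations.items():
        before = percent_of_variance(r)
        after = percent_of_variance(big_r)
        rows[name] = (before, after, 100 * (after - before) / before)
    return rows


rows = summarize(reported)
for name, (before, after, increase) in rows.items():
    print(f"{name}: {before}% -> {after}% (relative increase {increase:.1f}%)")
```

The relative increases are computed from the rounded percentages, which is how the figures of 78, 138, and 106 percent cited in the text are obtained.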

Further Research 

Clearly, additional research on the effect of restriction of range on the 
correlations of predictor variables with indicators of college success is needed. The 
single institution study cannot be considered to be definitive. Results for colleges or 
universities with different degrees of selectivity may differ. Further, it would be 
desirable if a situation could be found where the assumptions of the correction formula 
are met more closely than in this study. 

Conclusion 







Restriction of range is clearly one reason that correlations calculated from 
enrolled student populations understate the true relationships between predictor 
variables and college success measures. The values of such correlations should not be 
the sole criterion for decisions concerning the use of the predictors in college 
admissions. Other variables may also depress these correlations. For example, Pike and 
Saupe (1992) found that high school attended was a factor in the prediction of success 
in college. In that study it was found that when high school attended was 
controlled, the correlation increased. Unreliability in the predictor and in the college 
success measures also can depress the correlations (Sackett, Borneman, and Connelly, 
2008). 

In sum, the true relationships between predictor variables and college success 
measures can be masked by restricted range as well as other extraneous variables. 

The present study demonstrates the influence that restricted range can have on this 
relationship and suggests that these predictor variables are probably more accurate 
than is generally reported in the literature and assumed in practice. This study will have been 
successful if it stimulates others to explore the use of the correction formulas to 
estimate correlations between predictor variables and indicators of success in college 
for unselected populations. 